Semi-Supervised Learning Based on Semiparametric Regularization
Abstract
Semi-supervised learning plays an important role in the recent literature on machine learning and data mining, and the techniques developed have led to many data mining applications in recent years. This paper addresses the semi-supervised learning problem with an approach based on semiparametric regularization, which learns a parametric function from the marginal distribution of the data by exploiting the geometric structure of that distribution. The learned parametric function is then incorporated into supervised learning on the available labeled data as prior knowledge. Specifically, our contributions are: (1) We present a semi-supervised learning approach that incorporates the unlabeled data into supervised learning through a parametric function learned from the whole data set, labeled and unlabeled. The parametric function reflects the geometric structure of the marginal distribution of the data. Furthermore, the proposed approach extends naturally to out-of-sample data and is therefore an inductive learning method. (2) The approach allows a family of algorithms to be developed based on various choices of the original RKHS and the loss function. (3) We provide experimental comparisons showing that the proposed approach achieves state-of-the-art performance on a variety of classification tasks. In particular, we demonstrate that the approach can be used successfully in both transductive and semi-supervised settings.
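The abstract does not say how the parametric function is learned or which loss is used, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes the parametric function `psi` is the first kernel PCA component computed on all points (labeled and unlabeled), that the RKHS uses a Gaussian kernel, and that the loss is squared error. The names `rbf_kernel`, `fit_psi`, `fit_semiparametric_rls`, and `run_demo`, as well as the parameters `width` and `reg`, are hypothetical. The model has the form f(x) = h(x) + beta * psi(x), with h in the RKHS spanned by the labeled points.

```python
import numpy as np


def rbf_kernel(A, B, width=0.5):
    """Gaussian kernel matrix k(a, b) = exp(-width * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-width * np.maximum(sq, 0.0))


def fit_psi(X_all, width=0.5):
    """Assumed choice: psi is the first kernel PCA component of ALL data."""
    n = len(X_all)
    K = rbf_kernel(X_all, X_all, width)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ K @ H)  # ascending eigenvalues
    a = eigvecs[:, -1] / np.sqrt(eigvals[-1])  # normalized top direction
    col_mean, total_mean = K.mean(axis=0), K.mean()

    def psi(X):
        # Center the new kernel rows consistently with the training Gram matrix.
        k = rbf_kernel(X, X_all, width)
        kc = k - k.mean(axis=1, keepdims=True) - col_mean[None, :] + total_mean
        return kc @ a

    return psi


def fit_semiparametric_rls(X_lab, y_lab, psi, width=0.5, reg=1e-2):
    """Regularized least squares for f(x) = sum_i alpha_i k(x_i, x) + beta * psi(x).

    Solves the stationarity conditions of the squared-error objective:
        (K + reg * l * I) alpha + beta * p = y,   p^T alpha = 0,
    where p holds psi evaluated at the labeled points.
    """
    l = len(X_lab)
    K = rbf_kernel(X_lab, X_lab, width)
    p = psi(X_lab)
    A = np.zeros((l + 1, l + 1))
    A[:l, :l] = K + reg * l * np.eye(l)
    A[:l, l] = p
    A[l, :l] = p
    sol = np.linalg.solve(A, np.append(y_lab, 0.0))
    alpha, beta = sol[:l], sol[l]

    def predict(X):
        # Inductive: works for any new point, not only the training set.
        return rbf_kernel(X, X_lab, width) @ alpha + beta * psi(X)

    return predict


def run_demo(seed=0):
    """Two Gaussian blobs, one labeled point per class, the rest unlabeled."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(100, 2)),
                   rng.normal([2.0, 0.0], 0.5, size=(100, 2))])
    y = np.repeat([-1.0, 1.0], 100)
    labeled = [0, 100]
    psi = fit_psi(X)                           # uses labeled + unlabeled inputs
    predict = fit_semiparametric_rls(X[labeled], y[labeled], psi)
    return float(np.mean(np.sign(predict(X)) == y))
```

Calling `run_demo()` returns the fraction of points whose predicted sign matches the true class. In this toy setting the unlabeled data shape `psi`, and the two labels only fix its sign and scale through `beta`, which is the sense in which the parametric function acts as prior knowledge for the supervised step.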
Similar Resources
Semi-supervised Regression with Order Preferences
Following a discussion on the general form of regularization for semi-supervised learning, we propose a semi-supervised regression algorithm. It is based on the assumption that we have certain order preferences on unlabeled data (e.g., point x1 has a larger target value than x2). Semi-supervised learning consists of enforcing the order preferences as regularization in a risk minimization framew...
Interactive Segmentation in Multimodal Medical Imagery using a Bayesian Transductive Learning Approach
Labeled training data in the medical domain is rare and expensive to obtain. The lack of labeled multimodal medical image data is a major obstacle for devising learning-based interactive segmentation tools. Transductive learning (TL) or semi-supervised learning (SSL) offers a workaround by leveraging unlabeled and labeled data to infer labels for the test set given a small portion of label info...
SERBoost: Semi-supervised Boosting with Expectation Regularization
The application of semi-supervised learning algorithms to large scale vision problems suffers from the bad scaling behavior of most methods. Based on the Expectation Regularization principle, we propose a novel semi-supervised boosting method, called SERBoost that can be applied to large scale vision problems. The complexity is mainly dominated by the base learners. The algorithm provides a mar...
Statistical Analysis of Semi-Supervised Regression
Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regulari...
Transductive Classification via Dual Regularization
Semi-supervised learning has witnessed increasing interest in the past decade. One common assumption behind semi-supervised learning is that the data labels should be sufficiently smooth with respect to the intrinsic data manifold. Recent research has shown that the features also lie on a manifold. Moreover, there is a duality between data points and features, that is, data points can be classi...